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In this paper, we study a kernel smoothing approach for denoising a tensor field. Particularly, 
both simulation studies and theoretical analysis are conducted to understand the effects of the 
noise structure and the structure of the tensor field on the performance of different smoothers 
arising from using different metrics, viz., Euclidean, log-Euclidean and affine invariant metrics. 
■ We also study the Rician noise model and compare two regression estimators of diffusion tensors 

based on raw diffusion weighted imaging data at each voxel. 
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1 Introduction 



. In this paper, we discuss kernel smoothing of tensor fields by assuming and utilizing spatial smooth- 

ed \ ness of the field. The primary goal is to reduce the noise from the "observed" tensors by combining 
information from spatial neighborhoods. Here tensors refer to symmetric positive definite matrices. 
They appear in many contexts, such as fluid dynamics, robotics, tensor-based morphometry, etc. 
See Cammoun et al. (2009) for a review on applications of tensors. One of the most important 
applications of tensors is in medical imaging. In diffusion weighted magnetic resonance imaging 
(DT-MRI), diffusion of water molecules is measured along a pre-specified set of gradient directions 
(cf. Mori, 2007). The mobility of water molecules in neuronal tissues shows an anisotropy in the 
diffusion (cf. Le Bihan et al., 2001). In white matter, this diffusion anisotropy originates from the 
j_j ■ faster diffusion of water molecules through white matter fiber tracts. Therefore, white matter fiber 



tracts can by mapped in a non-invasive way through DT-MRI by revealing the pattern of diffusion. 
Indeed, DT-MRI has emerged as among the most important tools in understanding the structure of 
the brain. If the diffusion of water molecules is modeled by a 3-dimensional Brownian motion, then 
the covariance matrices of this Brownian motion characterize the diffusion at the corresponding 
locations of the brain and thus they are called diffusion tensors (see Section [5] for more details 
on a model for diffusion weighted imaging data). DT-MRI data are often subject to a substantial 
amount of noise, which results in a noisy tensor field (Gudbjartsson and Patz, 1995; Hahn et al., 
2006, 2009; Zhu et al., 2007, 2009). This may adversely affect the mapping of white matter fiber 
tracts (the process is known as tractography) and other applications relying on diffusion tensors as 
input data, such as measuring the anisotropy, morphometry etc. (Basser and Pajevic, 2000; Zhu et 
al., 2009). On the other hand, the physical continuity of the white matter implies a certain degree 
of spatial smoothness of the tensor field, which can be utilized to denoise the data. 
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One of the most commonly used techniques for denoising data measured at different spatial 
locations (or different time points) is through weighted averages over spatial (or temporal) neigh- 
borhoods. The weights are often determined by a kernel function and thus this approach is referred 
to as kernel smoothing. In this paper, we extend kernel smoothing techniques to the tensor space. 
Since the concept of averaging depends on the geometry of the space, we study the impacts of 
geometry on tensor smoothing. In particular, we consider three geometries on the tensor space: Eu- 
clidean geometry, log-Euclidean geometry and the affine invariant geometry. Under the Euclidean 
geometry, the average of the tensors is simply the (ordinary) arithmetic mean. The log-Euclidean 
geometry on a tensor space is proposed by Arsigny et al. (2005, 2006), where the averaging amounts 
to taking Euclidean average of the logarithm of the tensors followed by exponentiation of the re- 
sulting matrix. The affine-invariant geometry corresponds to the canonical Riemannian metric on 
a tensor space treated as a homogeneous space (Nomizu, 1954). In context of DT-MRI, it has been 
studied by Forstner and Moonen (1999), Fletcher and Joshi (2004, 2007), Pennec et al. (2006). 
There is no closed form formula for the affine invariant average (see Section [2] for more details). 
In Arsigny et al. (2005, 2006), the authors compare these three geometries when used in various 
procedures applied to the tensor field such as interpolation and filtering. It is shown there that, 
despite its simplicity, the Euclidean geometry has several drawbacks, in particular the swelling 
effect, which refers to the phenomenon of the interpolated tensor having a larger determinant than 
those of both the original tensors. In the context of DT-MRI, the determinant of a tensor reflects 
the degree of dispersion of water molecules, and thus the swelling effect is contradictory to physical 
principles. On the contrary, both the log-Euclidean and the affine-invariant geometries do not suffer 
from this problem. Indeed, under both geometries, tensor interpolation leads to the interpolation of 
the determinants (Arsigny et al, 2005). Moreover, Arsigny et al. (2005, 2006) show that, the log- 
Euclidean and the affine-invariant geometries give similar results for various tensor computations, 
with the former being computationally easier. 

Along with spatial smoothness, tensor fields often show certain degrees of anisotropy, e.g., due 
to faster water diffusion along white matter fiber tracts in context of DT-MRI. Conventional kernel 
smoothing methods use spherical neighborhoods and perform same amount of smoothing along 
every direction (referred to as the isotropic smoothing). On the contrary, anisotropic smoothing 
methods impose relatively larger amount of smoothing along the directions that show a greater 
degree of anisotropy (such as along the fiber tracts) . Thus such methods better preserve structures 
of the tensor field compared to isotropic smoothing. This is particularly desirable for DT-MRI 
since smoothed tensors and fractional anisotropics computed from them are often used as inputs for 
tractorgraphy. Anisotropic smoothing has been considered in Chung et al. (2003, 2005), Tabelow 
et al. (2008). 

In this paper, we conduct both simulation studies (Section[3|) and theoretical analysis (Section[4|) 
to explore the impacts of geometry on tensor smoothing. We find that, the (relative) performances 
of smoothing methods under different geometries mainly depend on two factors: (i) the noise 
structure; (ii) the structure of the tensor field. With additive or approximately additive noise, 
Euclidean smoothing perform comparably (under small noise levels) or better than (under moderate 
to large noise levels) the log-Euclidean smoothing and the affine-invariant smoothing (hereafter, 
referred to as geometric smoothing). This is partly due to the fact that both geometric smoothing 
methods involve taking logarithm of the tensors which amplifies additive noise. Here, "additive 
noise" refers to the noise structure where the expectation of the noise-corrupted tensor equals to 
the underlying noiseless tensor (see Remark |4] in Section [4]). On the other hand, when the noise 
structure is highly non-additive, geometric smoothing tend to outperform Euclidean smoothing. In 
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terms of tensor field structures, Euclidean smoothing works comparably to geometric smoothing 
in homogeneous regions, especially those with predominantly (nearly) isotropic tensors, while the 
geometric smoothing works better in regions with relatively high degrees of heterogeneity. The 
perturbation analysis conducted in Section [J] provides theoretical justifications for some findings of 
the simulation studies. Apart from highlighting the difference between the Euclidean and geometric 
smoothing, the perturbation analysis results also indicate that when the tensors are nearly isotropic, 
the two geometric smoothing methods are similar, which is also pointed out by Arsigny et al. (2005, 
2006). The simulation results also point to the need of a multi-scale approach, i.e., using adaptively 
chosen bandwidths for different regions according to their degrees of homogeneity of the tensors. For 
example, in the presence of crossing fibers, or near the edge of the fiber bundles (which are regions 
of high degrees of heterogeneity), tensor smoothing, especially under the Euclidean geometry, could 
lead to even worse results compared to the (un-smoothed) observed noisy tensors unless very small 
bandwidths are used. Whereas, in highly homogenous regions, relatively larger bandwidths often 
lead to better results. A multi-scale approach for tensor smoothing is proposed and studied in 
Polzehl and Spokoiny (2006) and Tabelow et al. (2008). Finally, the simulation studies show that 
in presence of highly anisotropic tensors, anisotropic smoothing usually improves over isotropic 
smoothing. 

In this paper, we also conduct numerical (Section[3j) and theoretical (Section[5]) studies under the 
Rician noise model to compare linear and nonlinear regression methods for extracting tensors from 
gradient-based diffusion weighted imaging data. These studies show that, the nonlinear regression 
method improves over the linear regression method, since the former has lower variability when 
the signal-to-noise ratio (SNR) is large. Indeed, when SNR is large , the efficiency of the nonlinear 
regression estimator is comparable to that of the maximum likelihood estimator. Zhu et al. (2009) 
also carry out asymptotic analysis of regression estimators under the Rician noise model, although 
it is under a different asymptotic framework. 

The rest of the paper is organized as follows. In Section 2, we discuss kernel smoothing on a 
tensor space. In Section 3, we present results from simulation studies and discuss their implications. 
In Section 4, a perturbation analysis is conducted to compare the means of a set of tensors under 
three different geometries on the tensor space. In Section 5, asymptotic analysis is conducted to 
study and compare the regression estimators under the Rician noise model. Technical details are 
given in the appendix. 

2 Kernel Smoothing on Tensor Space 

In this section, we extend kernel smoothing to a smooth manifold. In particular, we consider 
smoothing on the space of N x N positive definite matrices (hereafter, referred to as the tensor 
space, and denoted by Vn)- Consider a function / : V C M. d — > W. Suppose that we observe 
pairs {(si, Xi)Jf =1 with Sj 6 T> being the design points, and Xj £ W being noise-corrupted versions 
of f(si). The goal is to reconstruct the function / based on such observations. One way to 
fit /(•) at location s 6 P is to locally approximate /(•) by a polynomial and then use suitably 
weighted data to determine the coefficients of this polynomial. In statistics, this is called local 
polynomial smoothing (Fan and Gijbel, 1996). The simplest form is to consider a constant function 
approximation. Specifically, for s £T> , 
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where || • ||a denotes a Euclidean norm, and u>i(s) > are nonnegative weights. A common scheme 
for the weights is 

Ui(s) := K (^j^), i = l,---,n, (2) 

where K (•) is a nonnegative, smooth kernel on M. d , and h > is the bandwidth. Thus this method 
is also called kernel smoothing. Note that, || Xi — c H2, denoted by d(Xi, c), is simply the Euclidean 
distance between X{ and c. Let (A4,g) denote a smooth manifold A4 equipped with a metric g (i.e., 
a Riemannian manifold). Also denote the corresponding geodesic distance by d^i'-, •)■ Then kernel 
smoothing can be immediately generalized to (M,g) by using c£m(-, •) in place of d(-, •) in definition 
(HJ). Specifically, for a function / : V C M ' — >• (A4,g), and the observed data pairs {(si, Xi)}™ =1 
where Si € T> and Xj € .M are noise-corrupted versions of f(st), the kernel smoothing of /(•) is 

n 

/(a) := argminVwi^^^), s G Z>, (3) 
Yg.M *■ — ' 

i=l 

which is in the form of a weighted Karcher mean (Karcher, 1977) of {X±, ■ ■ ■ ,X n }. Note that, 
different geometries may be imposed onto a smooth manifold A4, which in turn result in different 
geodesic distances on M. Since the set of N x N positive definite matrices is the interior of a cone 
(consisting of all positive semi-definite N x N matrices) in the Euclidean space ~K NxN , Euclidean 
geometries can be naturally imposed onto the tensor space and the corresponding distances are 
referred to as the Euclidean distances. One such example is dE(X,Y) := {tr(X — Y) 2 } 1 / 2 which 
corresponds to the trace norm on W NxN . Under Euclidean distances, ([3]) can be easily solved by a 
weighted average 

n n 

f E (s) = ^uj i (s)X i /j2ui(s), (4) 
i=l i=i 

where the addition and scalar multiplication are the usual matrix addition, and scalar multiplica- 
tion, respectively. As an alternative to Euclidean geometries, Arsigny et al. (2005, 2006) propose 
logarithmic Euclidean (henceforth log-Euclidean) geometries on the tensor space. Firstly, a Lie 
group structure is added to the tensor space through defining a logarithmic multiplication 

XQY :=exp(log(X)+log(Y)), X,YeV N . 

Then it is shown that, the bi-invariant metrics on this tensor Lie group exist and lead to distances 
of the form: 

d LE (X,Y) =|| log X- logy 11, (5) 

where || • || is a Euclidean norm. Note that, the distance defined in ([5]) is similarity-invariant if the 
trace norm is used. Under these log-Euclidean distances, ([3]) can also be explicitly solved as 




/le(s) = exp ^2u)i(s)log(Xi) /^2u)i(s) , (6) 



i=i 



i.e., the matrix exponential of the weighted average of the logarithm of the tensors. 

It is also well-known that Vn can be identified with the quotient space GL + (N, M)/SO(N, ] 
which is defined by the conjugate group action 9 of GL + (N, M) on Vn- 

,X):=gXg T , for g G GL + (N,R) and X G V N . 
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Here, GL + (N, K) is the identity component of the general linear group GL(N, K) - the Lie (sub)group 
consisting of N x N matrices with positive determinant; and SO(N,M) is the special orthogonal 
group - the Lie (sub)group consisting of N x N orthogonal matrices with determinant one. The 
above quotient space (also referred to as Vn) is a naturally reductive homogenous space (Absil et 
al, 2008) and its bi-invariant metric is given by: for X E Vn an d S,T £ Tx(Vn) ~ the tangent 
space of Vn at X, 

(S,T) X :=tr (SA^TX" 1 ) . 
Under this metric, the geodesic distance between two points X,Y 6 Vn is 



d Aff (X,Y) :-- 



1/2 



tr (logiX-^YX- 1 ' 2 )) = ((log x (y),log x (Y))x) 1/2 » ( 7 ) 



where \og x {Y) := X 1 / 2 \og(X 1 ' 2 YX 1 I 2 )X V I 2 is the logarithm map on this space. Since this 
metric is affine invariant, i.e., for any g £ GL + (N, R) 

d Aff (9(g,X),e(g,Y)) = d Aff (X,Y), 

hereafter, it is referred to as the affine-invariant metric. 

The affine-invariant geometry on Vn has been extensively studied. For example, by Forstner 
and Moonen (1999), Fletcher and Joshi (2004, 2007). Particularly, in Pennec et al. (2006), a 
Riemannian framework for tensor computations is proposed under the affine invariant geometry 
and is applied to DT-MRI studies. Since Vn has non-positive curvature (Skovgaard, 1984), as 
pointed out by Pennec et al. (2006), there exists a unique solution to ([3]) under the affine-invariant 
metric ([7]) if all weights are positive. An intrinsic gradient descent algorithm is proposed in Pennec 
et al. (2006) for solving the weighted Karcher mean problem on this space. Alternatively, in 
Ferreira et al. (2006), and Fletcher and Joshi (2007), intrinsic Newton- Raphson algorithms are 
derived to find the Karcher mean, which can also be easily adapted to solve ([3]). In this paper, 
we propose a simple iterative method which does not involve any (intrinsic) gradient or Hessian 
computations. This method is based on the following observation: for a sequence of points z\, . . . , z n 
in the Euclidean space W, one can compute their weighted mean to := (Y^i=i w i z i) / (Y^7=i w i) f° r 
w±, . . . , w n > through an ra-step recursive procedure: 

(i) Set mi = z\ , and j = 1; 



TO.; 



(ii) Compute ?rij+i = rrij + r^ 1 (zj +1 ,,* 3h 

(iii) If j + 1 < n, set j = j + 1 and go to (ii). Otherwise, set to, = m n and stop. 

In step (ii), to j+ i is on the line segment connecting m,- and Zj+x- Note that straight lines are 
geodesies in the Euclidean space. Thus, as a generalization to a Riemannian manifold (A4,g), this 
step can by modified as follows: 

(ii) Compute to j+ i = 7(1 — ^jffi ), where 7(-) is the geodesic on (M.,g) such that 7(0) = mj 
and 7(1) = Zj+i. 

Note that, in general, the mean value computed this way depends on the ordering of the points 
Zi,...,z n unless z^s all lie on a geodesic in (Ai, g). When solving ([3]), we propose to pre-order 
AVs based on the Euclidean distances of Sj's from s, i.e., z\ := X\, where s± is closest to s among 
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all Sj's, etc. For a manifold (M,g), the geodesic 7(-) between two (nearby) points X, Y such that 
7(0) = X and 7(1) = Y can be expressed through the exponential map and the logarithm map: 
7(£) = exp x (tlog x (Y)). For the tensor space with amne-invariant metric, we have: for X, Y € Vm 
and S £ Tx(Vn) 

ex Px (S) = X 1 ' 2 exp(X- 1 /2 1 sx- 1 /2)x 1 /2 ! logx(y) = X V2 log^" 1 / 2 ^- 1 / 2 )^. ( 8 ) 

Compared with the gradient-based methods, the above algorithm has the advantages of being 
numerically stable and computationally efficient. In practice, it leads to similar results when solving 
([3]) as those obtained by the gradient-based methods (results not shown) . 

In this paper, we refer to the smoothing methods using the metrics defined in © or (J7J) as 
geometric smoothing, whereas the smoothing methods using the Euclidean metrics as Euclidean 
smoothing. The two geometric metrics have some appealing properties such as invariance. Moreover, 
under both log-Euclidean metric and affine-invariant metric, interpolation of tensors results in an 
interpolation of their determinants (Arsigny et al, 2005). Thus, unlike the Euclidean smoothing, 
geometric smoothing methods do not suffer from swelling effects. However, as we shall see in 
Sections [3] and [H the relative merits of these metrics in terms of tensor smoothing are less obvious 
and they rely on several factors (and their interactions), especially, the noise structure and the 
structure of the tensor field being smoothed. It turns out that, Euclidean smoothing perform 
comparably or even better than geometric smoothing, under nearly additive noise and/or regions 
dominated by (nearly) isotropic tensors. The perturbation analysis performed in Section 0] shows 
that, the logarithm operation on the tensors (which is used for both geometric smoothing methods, 
but not in Euclidean smoothing) amplifies additive noise, and consequently results in a bias. On the 
other hand, if the noise structure is highly non-additive and/or the regions are with heterogeneous 
tensors (such as at the crossings of fiber bundles or on the boundary of fiber bundles), geometric 
smoothing tend to outperform Euclidean smoothing, presumably as a benefit of respecting the 
intrinsic geometry of the tensor space. 

Besides the choice of metrics, one also needs to choose a scheme to assign weights u>i(s)'s in ([3]). 
The conventional approach is to simply set weights as in ([2]) where K is a fixed kernel, e.g., the 
Gaussian kernel k(t) = exp(— 1 2 /2). This is referred to as isotropic smoothing. However, the tensor 
field often shows various degrees of anisotropy in different regions and the tensors tend to be more 
homogeneous along the leading anisotropic directions (e.g., along the fiber tracts). Thus it makes 
sense to set the weights larger if Sj — s is along the leading diffusion direction at s. Therefore, we 
propose the following anisotropic weighting scheme: 



Ui(s) := K h tytr(D)(8i - s) T D~^( Si - s)), (9) 

where D is the current estimate of the tensor at voxel location s, and Kh(-) '■= K(-/h) is a 
nonnegative, integrable kernel with h > being the bandwidth. The use of ti{D) in Q is to 
set the weights scale- free with regard to D. There are other schemes for anisotropic weights. For 
example, in Tabelow et al. (2008), the term tr(D) is replaced by det(-D) in ([9]), which is supposed 
to capture not only the directionality of the local tensor field, but also the degree of anisotropy. 
Chung et al. (2003, 2005) also propose kernel smoothing under Euclidean geometry with anisotropic 
kernel weights. Simulation studies in Section [3] show that, as expected, anisotropic smoothing 
improves over isotropic smoothing in regions with highly anisotropic tensors, and the two perform 
similarly in regions with predominantly isotropic tensors. Moreover, for both isotropic weights ([2]) 
and anisotropic weights ([9]), the bandwidth h needs to be specified. The bandwidth h controls the 
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amount of smoothing: the larger h is, the more smoothing is performed. The choice of bandwidth is 
an active field of research in kernel smoothing. In principle, in relatively more homogeneous regions, 
a larger h should be used, while in relatively less homogeneous regions, a smaller h should be used. 
Since the tensor fields often show various degrees of homogeneity across different regions, i.e., in 
some regions, the neighboring tensors are more alike, while in some other regions, the neighboring 
tensors are more different, bandwidth should be adaptively chosen according to the local degree 
of homogeneity. This idea has been studied in Polzehl and Spokoiny (2006), Polzehl and Tabelow 
(2008) and Tabelow et al. (2008), where a structural adaptive smoothing is proposed. In this 
paper, we do not try to give a general prescription of bandwidth selection for tensor smoothing. 
Rather, we simply show by simulation studies that the smoothing methods work best under different 
bandwidth in different regions. 

3 Simulation Studies 

In this section, we conduct systematic simulation studies to explore the impacts of geometry on 
tensor smoothing. We also compare isotropic and anisotropic smoothing, as well as examine tensor 
smoothing across a set of bandwidth choices. We first describe the simulation design, in particular, 
the structure of the simulated tensor field and the noise models. 

3.1 Simulated tensor field 

The simulated tensor field consists of 4 slices (in the z-direction) each of dimension 128 x 128 (in the 
x- and y-directions). Thus, altogether there are 128 x 128 x 4 voxels. The tensor field consists of two 
types of regions: the background regions with identical isotropic tensors - the 3x3 identity matrices 
I3 = diag(l, 1, 1); and the bands regions with anisotropic tensors: on each band, all the tensors are 
the same and they point to either the vertical (i.e., y-direction) or the horizontal direction (i.e., 
x-direction) . More specifically, each slice has three parallel vertical bands with tensors all point to 
the vertical direction and three horizonal bands with tensors all point to the horizontal direction. 
These bands are of various widths and the tensors on them also show various degrees of anisotropy. 
Whenever a horizontal band and a vertical band cross each other, the tensors at the crossing voxels 
are set as the ones on the corresponding horizontal band. See schematic plot Figure [1] for an 
illustration. Moreover, the bands structures for slices 1 and 2 are the same, and those of slices 3 
and 4 are the same. The location and width of the bands and the tensors on each band are given 
in Table 1. 

3.2 Noise models 

After the tensors (referred to as the true tensors) are simulated according to the previous section, 
we add noise to them to derive noise corrupted tensors (referred to as the observed tensors). We 
consider two different noise structures. The first noise model is motivated by the process of data 
acquisition in DT-MRI studies from diffusion weighted imaging data. The second noise model is mo- 
tivated by the consideration of decoupling the noise variability in the eigenvalues and eigenvectors. 
We describe both models in detail in the following. 
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Figure 1: Schematic plot of one slice of the simulated tensor field. 



Rician noise 

In this model, we first generate the noiseless diffusion-weighted multi-angular images (DWI) data 
via model (|10p (Mori, 2007) using a set of 9 gradient directions, i.e., for a voxel with true diffusion 
tensor D, the diffusion weighted signal intensity in direction b k (a 3 x 1 vector of unit norm) is 
modeled as 

S k = S exp{-blDb k ), (10) 

where So, or the baseline signal strength (without diffusion weighting), is set to be 10 in our simu- 
lation. These 9 gradient directions come from a real DT-MRI study, and they are the normalized 
versions of the following vectors: 

(1,0,1), (1,1,0), (0,1,1), (0.3,0.2,0.1), (0.9,0.45,0.2), (1,0,0), (0,1,0), (0,0,1), (2,1,1.3). 

In our simulation, each of the 9 gradient directions is repeated twice which results in 18 DWI 
measurements per tensor. (This is the design used in the aforementioned real study). These 
DWI intensities S k are then corrupted by Rician noise (see equation (|24p in Section [5]), which 
gives rise to noise corrupted DWI's S^s (referred to as the observed DWI's) for each tensor. We 
consider three different values for the noise parameter a in equation (|24p . namely, a = 0.1,0.5,1 
which corresponds to low, moderate and high noise levels, respectively. Finally, at each voxel, 
a regression procedure is applied to the observed DWI data to derive the observed tensor. Two 
different regression procedures are considered (i) linear regression of {log(S'/ c )} on (see equation 
(|25|) ): (ii) nonlinear regression of {Sk} on {b k } (see equation (|26|) ). Note that, unlike smoothing 
methods, for each voxel, regression procedures only use DWI data from that voxel to derive the 
corresponding tensor and no information from neighboring voxels is used. See Section [5] for more 
details on the regression procedures. 

The analysis carried out in Section [5] also shows that, under the Rician noise model, at relatively 
high signal-to-noise ratios, the observed DWI intensities as well as the tensors derived from them 
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Table 1: Description of the simulated tensor field 



Band orientation 


Slice # 


Band # 


Indices for band boundaries 


Tensor on band 


Horizontal 


1 and 2 


1 


20, 35 


diag(0.25,16,0.25) 






2 


60, 75 


diag(0.5,4,0.5) 






3 


90, 105 


diag(0.7,2,0.7) 




3 and 4 


1 


40, 50 


diag(0.25, 16,0.25) 






2 


80, 90 


diag(0.5,4,0.5) 






3 


110, 120 


diag(0.7,2,0.7) 


Vertical 


1 and 2 


1 


20, 35 


diag(16, 0.25, 0.25) 






2 


60, 75 


diag(4,0.5,0.5) 






3 


90, 105 


diag(2,0.7,0.7) 




3 and 4 


1 


40, 50 


diag(0.25, 16,0.25) 






2 


80, 90 


diag(0.5,4,0.5) 






3 


110, 120 


diag(0.7,2,0.7) 



by regression methods are approximately unbiased, i.e. E(iSfc) ~ S)~, and K(D) ~ D. A high 
signal-to- noise ratio results from either a small noise level for each intensity measurement (i.e., a 
small a), or a large number of gradient directions. This implies that, under such cases, we can view 
Rician noise for tensors as an approximately additive noise (See also Remark [8] in Section [5]). 

Spectral noise 

In this model, tensors are corrupted directly by adding random perturbations to their eigenvalues 
and eigenvectors which leads to a highly non-additive noise structure. Specifically, consider the 
spectral decomposition of the tensor D 

D = EAE T , 

where A is a diagonal matrix with positive non- increasing diagonal elements (i.e., eigenvalues of 
D), and E is the corresponding orthogonal matrix consisting of eigenvectors of D. Then, the noise 
corrupted tensor D is derived by 

D := EAE T . 

In the above, A is a diagonal matrix, with Ajj = AjjUj (j = 1,2,3) where Uj's are i.i.d. x? l/ \/ l/ - 
Here v is a positive integer and ")S V \ denotes the Chi-square distribution with v degrees of freedom. 

The parameter v controls the degree of fluctuation of A around A. Indeed, E(A) = A, and larger v 
implies smaller fluctuations. Moreover, 

E := (I 3 + r,Z)[{h + vZf(h + ^Y^E 

where Z is a 3 x 3 matrix with i.i.d. N(0, 1) entries, which are independent of {Uj}^ =1 , and ry > 0. 
Note that, (^3 + rjZ)[(Is + r/Z) 7 (I3 + r/Z)] -1 / 2 is a random orthogonal matrix. The parameter 77 
controls the degree of this matrix deviating from the identify matrix I3 and larger 77 implies more 
fluctuations in E. In the simulation, we consider the following values for the noise parameters 
(i/,77): (50,0.1), (50,0.2) (50,0.3), (20,0.1), (20,0.2) and (20,0.3). 
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3.3 Results of tensor smoothing 

In this section, we report the estimation errors in terms of the affine-invariant distance ([7]) between 
the true and smoothed tensors at each voxel. We observe that these errors across the tensor field 
have a right-skewed distribution (more skewed for the geometric smoothers). Thus, measures like 
mean and standard deviation are not very representative of the error distribution. Therefore, the 
median and median absolute deviation about median (abbreviated as MAD) of the errors are used 
as performance measures. Median quantifies the typical error across tensors and MAD measures 
the robustness of the smoothers. Under the Rician noise model, for geometric smoothing, the MAD 
of the errors is somewhat larger than that of Euclidean smoothing, especially at higher noise levels, 
indicating that geometric smoothing is more sensitive to nearly additive noise. The situation is 
reversed for the spectral noise model. 

For a clearer comparison of different smoothing methods, the tensor fields are divided into 
different regions and the median errors within each region are reported for different smoothers and 
different bandwidth choices. Six regions are considered and they are illustrated in the schematic plot 
Figure[2j Specifically, "whole set" refers to the whole tensor field; "bands crossing" refers to crossing 
of two bands and the boundary of a band and the background (tensors circles by black line in Figure 
[2]); "background interior" refers to regions within the background which are at least four tensors 
away from any band (dark blue tensors) ; "bands interior" refers to regions within a band which are 
at least four tensors away from the background (red and green tensors); "background boundary" 
refers to regions within the background which are within four tensors away of a band (light blue 
tensors) ; and "bands boundary" refers to regions on bands which are within four tensors away from 
the background (orange and light green tensors). As mentioned earlier, the background consists 
of isotropic tensors and the bands consist of anisotropic tensors. Moreover, both "background 
interior" and "bands interior" are relatively homogeneous regions (i.e., the neighboring tensors are 
alike), whereas "background boundary", "bands boundary" are relatively heterogeneous regions 
(i.e., neighboring tensors could be very different), and "bands crossing" are highly heterogeneous 
regions. 

Results under Rician noise model 

Three smoothers corresponding to three geometries on the tensor space are considered, namely (a) 
Euclidean smoothing; (b) Log-Euclidean smoothing; and (c) affine-invariant smoothing. Tensors 
from both linear regression of observed DWI's (referred to as linear input) and nonlinear regression 
of observed DWI's (referred to as nonlinear input) are used as inputs for tensor smoothing. We 
consider both the isotropic smoothing scheme and the anisotropic smoothing scheme. In the latter 
case, the smoothing is done in two stages. In the first stage, isotropic smoothing is performed to get a 
preliminary estimate Di SO (s) at each voxel s and in the second stage, anisotropic weights are derived 
by setting D = Di so (s) in ([9]) and then smoothing is performed using the isotropically smoothed 
tensors Dj so (s)'s and these weights. Three different bandwidth choices are considered for isotropic 
smoothing: h = 0.005, 0.01 and 0.025. For anisotropic smoothing, the bandwidth pairs being 
considered are (hi SO , h an i so ) = (0.005,0.01), (0.01,0.01) and (0.01,0.025), where hi S0 and h an i so , 
denote the bandwidths used in the first stage and the second stage, respectively. We use a Gaussian 
kernel and truncate the weights that fall below a small threshold (10 -6 ) to speed up computation. 
We then re-scale the weights such that their summation equals to one. The set of neighboring voxels 
that receive positive weights is defined as the neighborhood. The neighborhood does not vary across 
tensors for isotropic smoothing at each given bandwidth, but it is varying for anisotropic smoothing. 
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Figure 2: Schematic plot of different regions for the simulation design. 



In Table 2, the size of each neighborhood, the number of voxels which account for (cumulatively) 
99% of the total weights (in parentheses), as well as summary statistics of the weights distribution 
such as minimum, median, maximum and entropy (defined as — Y2i u i 1°S u i) f° r isotropic smoothing 
are reported. Note that, as bandwidth increases, the entropy of the weights also increases. Also 
note that, h = 0.005 results in essentially no smoothing (since > 99% of the weights are on the 
voxel under smoothing itself). 

Table 2: Neighborhood size and summary statistics of the weights distribution for isotropic smooth- 
ing. 



h 


Size 


min(w) median(a;) max(w) Entropy(w) 


0.005 
0.01 
0.025 


5(1) 
23 (9) 
147 (113) 


0.000881 0.000881 0.996477 0.0283 
0.000002 0.000487 0.551461 1.5140 
0.000061 0.002371 0.071480 4.0034 



We also conduct simulations where the nine gradient directions are used only once in the DWI 
data generation. Table 3 reports the median and MAD of the estimation errors for linear and 
nonlinear regression procedures across different regions. It shows that (i) nonlinear regression of the 
observed DWI data gives more accurate estimate of the tensors compared to the linear regression. 
The improvement is only slight on the background (i.e., the isotropic regions). However, the 
improvement is significant on the bands (i.e., the anisotropic regions); (ii) comparison under the 
two simulation schemes (9 gradients each used once and twice, respectively) shows that the results 
for both linear and nonlinear regression improve when the number of gradient directions increases; 
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(iii) results across different noise levels show that the performance of both linear and nonlinear 
regression degrades when the noise level increases. 

Table 3: Comparison between linear regression and nonlinear regression on DWI data. Reported 
numbers are the medians of the error norms (in parenthesis, MAD of error norms) in three regions: 
whole set, bands and background. 









gradient directions 


each used once 




Noise level 


Method 


Whole set 


Bands 


Background 


a = 


0.1 


linear 


0.1049 (0.0460) 


0.3207 (0.2409) 


0.0757 (0.0185) 






nonlinear 


0.0991 (0.0380) 


0.1823 (0.1177) 


0.0757 (0.0185) 


a = 


0.5 


linear 
nonlinear 


0.5462 (0.2523) 
0.5141 (0.2148) 


1.6672 ( 1.3299) 
1.0617 (0.7383) 


0.3850 (0.0982) 
0.3829 (0.0970) 


a = 


= 1 


linear 


1.2382 (0.6738) 


10.9 ( 5.844) 


0.819 (0.2592) 






nonlinear 


1.1318 (0.5615) 


2.8713 (2.3365) 


0.8009 (0.2424) 






9g 


radient directions each repeated twice 




Noise level 


Method 


Whole set 


Bands 


Background 


a = 


0.1 


linear 


0.073891 (0.0322) 


0.229592 (0.1734) 


0.053692 (0.0130) 






nonlinear 


0.069904 (0.0267) 


0.129959 (0.0844) 


0.053679 (0.0130) 


a = 


0.5 


linear 


0.383068 (0.1770) 


1.330171 (1.0666) 


0.271789 (0.0681) 






nonlinear 


0.359311 (0.1481) 


0.828572 (0.5926) 


0.269491 (0.0672) 


a = 


= 1 


linear 


0.825685 (0.4091) 


2.489265 (2.0479) 


0.566317 (0.1579) 






nonlinear 


0.758624 (0.3443) 


1.726173 (1.2410) 


0.548341 (0.1484) 



The median errors for different smoothers across different bandwidths within each of the six 
regions (illustrated by Figure 2) are plotted in Figures 3 to 8. In these figures, the x axis de- 
notes bandwidths, and the median errors for different smoothers and the observed tensors are 
represented by different symbols, with circle corresponding to the observed tensor (either derived 
by linear regression or by nonlinear regression), triangle corresponding to Euclidean smoothing, 
+ corresponding to log-Euclidean smoothing, and x corresponding to affine-invariant smoothing. 
The main findings are summarized below. 

1. Tensor smoothing generally improves upon the corresponding regression results (either linear 
or nonlinear) except for the "bands crossing" regions which are highly heterogenous. 

2. At small noise levels (<r = 0.1), the three smoothers - Euclidean, log-Euclidean and affine- 
invariant - perform comparably, with geometric methods working slightly better. 

3. At moderate to large noise levels (a = 0.5 or 1), the Euclidean smoother works better or even 
considerably better (e.g. when a = 1) than the two geometric smoothers. 

4. The log-Euclidean smoother appears to be more sensitive to noise than the affine-invariant 
smoother indicated by a better performance of the latter under large noise levels {a = 1). 
Overall, the qualitative performances of these two methods are comparable. 

5. Euclidean smoothing results in prominent swelling effects near the crossing of two bands. 

6. At small noise levels (a = 0.1), geometric smoothing methods perform better than Euclidean 
smoothing in the regions with high degrees of heterogeneity, i.e., the "bands crossing" regions. 
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7. At comparable bandwidths, anisotropic smoothing often improves over isotropic smoothing. 
For example, anisotropic with (0.01, 0.01) performs better than isotropic with h = 0.01 except 
for on "bands crossing"; and anisotropic with (0.01, 0.025) performs better than isotropic with 
h = 0.025 at small noise levels (except for on "background boundary"). 

8. In terms of bandwidths choices, in more homogeneous regions, relatively larger bandwidths 
are preferred, whereas in more heterogeneous regions, smaller bandwidths are preferred. 
Also, larger bandwidths are preferred under higher noise levels. More specifically, on "bands 
crossing" (highly heterogeneous), for small noise levels (a = 0.1), isotropic smoothing with 
h = 0.005 is the best which indeed corresponds to nearly no smoothing, whereas, for larger 
noise levels (a = 0.5, 1), isotropic smoothing with h = 0.01 is the best. On "background 
interior" (highly homogeneous and isotropic), anisotropic smoothing with (0.01,0.025) and 
isotropic smoothing with h = 0.025 work comparably and better than other bandwidth 
choices. On "bands interior" (relatively homogenous and highly anisotropic), anisotropic 
smoothing with (0.01,0.01) or (0.01,0.025) is the best under small noise levels, and isotropic 
smoothing with h = 0.025 is the best under larger noise levels. On "bands boundary" (rela- 
tively heterogeneous), anisotropic smoothing with (0.005,0.01) is the best under small noise 
levels, whereas isotropic smoothing with h = 0.025 is the best under larger noise levels. On 
"background boundary" (relatively heterogeneous), under small noise levels, isotropic smooth- 
ing with h = 0.005 or 0.01 and anisotropic smoothing with (0.005, 0.01) or (0.01, 0.01) all work 
comparably and are better than other bandwidths choices, whereas under larger noise levels, 
anisotropic smoothing with (0.01,0.01) is the best. 

Results under spectral noise 

The three smoothers corresponding to Euclidean, Log-Euclidean, and affine-invariant metrics are 
applied to the noise corrupted tensors. Four different bandwidth choices for isotropic smoothing: 
h = 0.005, 0.01, 0.025 and 0.035 are considered. For anisotropic smoothing, the bandwidth pairs 
being considered are (hi SO , h an i so ) = (0.005,0.01) and (0.01,0.025). 

The median errors are plotted in Figures 9 to 14. The main findings are summarized below. 

1. At all noise levels, the geometric smoothers outperform the Euclidean smoother across all 
regions except for "background interior" (highly homogeneous and isotropic) where Euclidean 
smoother works slightly better under larger noise levels [y = 20). 

2. Log-Euclidean smoother performs slightly better than the affine-invariant smoother, even 
though their qualitative performances are comparable. 

3. Euclidean smoother results in prominent swelling effects near the crossing of two bands. 

4. Geometric smoothing are more advantageous in the regions that are more heterogeneous. 
This can be seen by comparing the results on "background boundary" with "background 
interior", and "bands boundary" with "bands interior". 

5. Anisotropic smoothing performs better than isotropic smoothing (at a comparable bandwidth 
and especially under smaller noise levels) in the regions with higher degrees of anisotropy. 
This is prominent in the regions "bands boundary" and "bands interior" (which together form 
the bands). 
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6. In terms of bandwidths choices, in more homogeneous regions, relatively larger bandwidths 
are preferred, whereas in more heterogeneous regions, smaller bandwidths are preferred. 
Also, larger bandwidths are preferred under higher noise levels. More specifically, on "bands 
crossing" (highly heterogeneous), anisotropic with (0.005,0.01) and isotropic smoothing with 
h = 0.01 work best; on "background interior" (highly homogeneous and isotropic), anisotropic 
smoothing with (0.01,0.025), and isotropic smoothing with h = 0.025,0.035 work comparably 
and better than other bandwidths choices. On both "bands interior" (relatively homoge- 
nous and highly anisotropic), and "bands boundary" (relatively heterogeneous and highly 
anisotropic), anisotropic smoothing with (0.01, 0.025) works best. On "background bound- 
ary" (relatively heterogeneous), under smaller noise levels (y = 50), anisotropic smoothing 
with (0.005,0.01) and (0.01,0.025), and isotropic smoothing with h = 0.01 work comparably 
and better than other bandwidths, whereas, under larger noise levels {y = 20), anisotropic 
smoothing with (0.01,0.025) works best. 

7. As rj increases (corresponding to more variable eigenvectors), for any given u, the performances 
of all smoothers across all regions degrade, except for the "background interior" regions 
(highly homogeneous and isotropic). 

4 Perturbation analysis of smoothing methods 

In this section, we conduct a perturbation analysis to compare the means of a set of tensors with 
respect to the Euclidean, log-Euclidean and Affine invariant metrics. Arsigny et al. (2005) also 
carry out a theoretical analysis studying and comparing log-Euclidean and affine-invariant metrics 
in tensor computation. They derive an asymptotic expression of the difference between the log- 
Euclidean mean and the affine-invariant mean when the true tensors lie in a small neighborhood of 
the identity matrix. They also explore situations where these two means are equal. 

We carry out a perturbation analysis by adopting a framework similar to that in Arsigny et 
al. (2005). However, we focus mainly on the quantitative aspects of the differences between these 
means. In particular, we derive asymptotic expansions for the difference between the Euclidean 
and log-Euclidean means, and that between the Euclidean and affine-invariant means. We derive 
these results for general tensors, i.e., without requiring them to be nearly isotropic. These results 
also highlight the distinction between the situations when the noise structure is additive (meaning 
that the expectation of the tensors equals to the underlying noiseless tensor) and when the noise 
structure is non-additive. 

Suppose that we observe random tensors {S(u) £ Vn '■ w € f2}, where O is an arbitrary index 
set with a Borel cr-algebra. Let P be a probability measure on Q. 

Let S denote the expectation of S(u) with respect to P, i.e., 



where the integration is defined through (ordinary) matrix addition of the tensors. Let B(oj) := 
S(cj) — S. Then E(2?) = 0, where denotes the N x N zero matrix. 

Observe that, the expectation of S(u) coincides with the weighted Karcher mean of S(ui) with 
respect to the Euclidean metric, i.e., 




(11) 



S = E(5) = arg min 
KeV N 
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where d{-, •) is a Euclidean distance. Next, let Sle denote the mean of S(uj) with respect to the 
log-Euclidean metric, i.e., 

S LE :=argmm [ d 2 LE (K, S(u))dP(u). (12) 

K£Vn J 

It is easy to show that, 

logS LE = E(log,S) = J \ogS{u)dP{u). 
Finally, let §Aff denote the affine-invariant mean of S(uS), i.e., 

S Aff := arg min / d\ ff (K, S(u>))dP(u), (13) 

In general, there is no closed form expression for S^ff, but S A ff satisfies the following barrycentric 
equation (Arsigny et ai, 2005): 

E(log(^}/ 2 55^f)) = 0. (14) 

In the following, we adopt a matrix perturbation analysis approach to study the differences 
among means under different metrics. Let \j denote the j-th largest distinct eigenvalue of S and 
let Pj denote the corresponding eigen-projection. We make a distinction between the cases when 
S is a multiple of the identity (i.e., S is isotropic) and when otherwise (i.e., S is anisotropic). Let 

| maxjmaxj A" 1 , max/^j \Xf. — Xj if S ^ X\I 

'~ [K 1 ifS = XiI 

where k ^ j corresponding to pairs of distinct eigenvalues of S and / denotes the N x N identity 
matrix. We also assume that the probability measure P satisfies 

P(sup || B(u) \\< C~H) = 1, (15) 

for some t > 0, where || • || denotes the operator norm. Note that, t can be viewed as a scale 
parameter indicating the spread of S^u^'s. A small value of t means that the tensors are more 
homogeneous. 

In the case that S is not a multiple of the identity (i.e., S is anisotropic), define Hj to be the 
matrix 

Br-^j^P, do) 

Note that HjPj = PjHj = for all j. For simplicity of exposition, in the following, we consider 
only two cases: (a) when the eigenvalues of S are all distinct; and (b) when S is isotropic, i.e., all 
its eigenvalues are equal to Ai- 

Comparison of Euclidean and log-Euclidean means 

Proposition 1 Suppose that the tensors {S(uj) : oj G £1} and the probability distribution P satisfy 

m 
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(a) If the eigenvalues of S = E(5) are all distinct, then 



(b) If S = Ail, then 



log S LE - log S 

N N 

- pE[tr(P jBHjB)]Pj - - £ ^ntr{P j B)fP j 

j=l 3 j=l j 

N 

+ l °S AjE [PjBHjBHj + HjBPjBHj + HjBHjBPj 

-p 3 bp 3 bh) - P 3 BH]BP j - //;/,>/</;/< 

- 1 

- —EitriPjB^PjBHj + flj-SPj)] + °( i3 )- ( 17 ) 
i=i J ' 



log£ L £-logS = -^E(P 2 ) + 0(i 3 ). (18) 



Remark 1 When the eigenvalues of S are all distinct, taking trace on both sides of |i7| J, and using 
the fact that PjHj = and P 2 = Pj, we get 

N N 
tr(logS LE )-tr(logS) = -J2-E[tr(P j BH j B)]--J2^E[tr(P j B)f + 0(t 3 ). (19) 

j=i i j=i j 

Since 

tr{\ogS LE ) - tr{\ogS) = \ogdet{S LE ) - logdet(S), 



where det denotes the determinant, \19\) actually gives an expansion of the determinant of Sle 
around the determinant of S. 

Comparison of Euclidean and affine-invariant means 

Proposition 2 Suppose that the tensors {S(oj) and the probability distribution P satisfy 

(a) If the eigenvalues of S are all distinct, then 
log S A ff - log S 

N N 

j=l 3 j=l 
1 N 1 

= -^Y.Y ntr{PjBS " lB)]Pj 

3=1 'J 



1 N 

+ - ^ log X j E[P j BS- 1 BH j + HjBS~ l BPj] + 0(t 3 ). (20) 
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(b) IfS = X 1 I, then 



1 - ~* 



log 5 A// - log 5 = -^f-(B') + 0(t"). (21) 



Note that, Propositions 1 and 2 can be used together to give an asymptotic expansion for the 
difference between the log-Euclidean and affine-invariant means. 

Remark 2 When the eigenvalues of 5 are all distinct, taking trace on both sides of Ii20\) . and using 
PjHj = 0, we get 

1 N 1 

logdet(5 A// ) -logdet(S) = tr(\ogS Aff ) - tr(logS) = --^2—E[triP j B§- 1 B)] + 0(t 3 ). (22) 

3=1 3 

Remark 3 Expressions $18\) and Ii21\) show that, when 5 = Ail (i.e., isotropic), Sle and 5a// 
are the same up to the second order (in terms of the dispersion parameter t). On the other hand, 
from [11) and A20\) we can conclude that when 5 is sufficiently far from being isotropic, there is a 



second order difference between Sle and 5a//- Moreover, in these expansions, the terms involving 
log Xj tend to dominate when at least one of the eigenvalues of 5 is close to zero. Thus, we expect 
to see a large difference between the log-Euclidean and affine-invariant means when 5 is anisotropic 
and has near zero eigenvalues. This is also confirmed by simulations. 

Remark 4 When the noise is additive, i.e. 5 = E(5) = 5* where S* denotes the "true" or target 
tensor, our analysis shows that both 5a// and Sle are biased "estimators" for S* and the bias can 
be quantified by Propositions [7] and 0- 

Remark 5 Suppose that the noise is multiplicative in the following sense. Let S* = GAG T be 
the true tensor where the columns of the N x N orthogonal matrix G denote the eigenvectors 
corresponding to the eigenvalues Si > • • • > 5n > 0, which are the diagonal elements of the diagonal 
matrix A. Suppose that the observed tensors are of the form 

S{u) = G diag(5 ie Zl{uj) , ■ ■ ■ ,5 N e ZN ^)G T 

where Z\, . . . , Zn are independent random variables with Zj having uniform distribution on the 
interval [—Cjt,Cjt] for constants c\,...,cn > 0. Then, the tensors S(u) satisfy \59\) (see the proof 
of proposition 2). Moreover, the tensors commute with each other, which implies that Sle — 5a//- 
Now, using the fact that E(Zj-) = (j = 1, . . . , N), we have 

S LE = exp(G dm 5 (E(log S x + Z ± ), • • • ,E(log^ + Z N ))G T ) 
= exp(GlogAG T ) = GAG T = S*. 

However, E(e^ J ) = (e Cjt — e~ Cjt ) / (2cjt) ^ 1, which implies that 5^5*. In this case, the Euclidean 
mean will be a biased estimator for the true tensor S* , and the bias is of the order 0(t 2 ). 



5 Regression analysis of DWI data under Rician noise model 

In this section, we study the Rician noise model for diffusion weighted imaging data (Hanh et al, 
2006). Our goal is to compare two regression procedures for deriving diffusion tensors from raw 
diffusion weighted images under this noise model. 
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In DT-MRI studies, the raw data obtained by MRI scanning are complex numbers representing 
the Fourier transformation of a magnetization distribution of a tissue at a certain point in time (cf. 
Mori, 2007). If the electronic noise in the real and imaginary parts of the raw data are assumed 
to be independent Gaussian random variables (Henkelman 1985; Gudbjartsson and Patz 1995; 
Macorski 1996), then the corresponding signal intensities follow a Rician distribution. Statistical 
properties of this noise model are studied by Zhu et al. (2009), where diagnostic tools to assess the 
quality of MR images are developed. Zhu et al. (2007) propose a semiparametric model for noise 
in diffusion-weighted images, and develop a weighted least squares estimate of the tensors. They 
study the effects of noise on the estimated diffusion tensors, as well as on their eigen-structures 
and morphological classification. The Rician noise model is also analyzed by Polzehl and Tabelow 
(2008) where a weighted version of the maximum likelihood estimator of the tensors is proposed. 

In this section, we consider the Rician noise model and focus on: (i) obtaining asymptotic 
expansions of the linear and nonlinear least squares estimators in high signal-to-noise ratio (SNR) 
regimes - the results show that the linear regression estimate is less efficient; and (ii) showing 
that the nonlinear least squares estimator is asymptotically as efficient as the maximum likelihood 
estimator in high SNR regimes; (iii) quantifying the bias in the linear least squares estimator at 
low SNR. Here we assume that the gradient directions in the MR data acquisition step is fixed and 
the noise level varies. In contrast, in Zhu et al. (2007, 2009), the number of gradient directions 
increases to infinity while the noise level is kept fixed. 

5.1 Rician noise model and regression estimates 

We first describe the Rician noise model for diffusion-weighted MR images at a given voxel. Let 
B denote a set of gradient directions (3x1 vectors of unit norm). Under the model for diffusion 
tensors (cf. Mori, 2007), the noiseless diffusion weighted signal at direction 6 € B is given by 

S b = S exp(-b T Db), (23) 

where 5o is the baseline intensity, and D is a 3 x 3 positive definite matrix (the diffusion tensor). 
The Rician noise model says that, the observed diffusion weighted signal S b is a random variable 
obtained by: 

S b = (\S b u bl +ae bl \ 2 + \S b u b2 + ae b2 \ 2 ) =\\ S b u b + ae b ||, (24) 

where u b := (u b i,u b 2) T is a unit vector in M 2 , and the random vectors e b := (e b i,e b 2) T are dis- 
tributed as N(0, I 2 ) and are assumed to be independent for different b. The parameter a > 
controls the noise level. 

We consider two regression based methods for estimating D based on the observed DWI signal 
intensities {S b : b € B}. For simplicity, throughout this section, Sq is assumed to be known and 
fixed. The first method is to use the log transformed DWI's to estimate D by a linear regression. 
The resulting estimator is referred to as the linear regression estimator and denoted by Dls- The 
second method uses nonlinear regression based on the original DWI's to estimate D. The resulting 
estimator is referred to as the nonlinear regression estimator and denoted by D^i. 

In order to define these estimators, we introduce some notations. Specifically, the tensor D 
is treated as a 6 x 1 vector with elements Dn, D 22 , -D33, D\ 2 , D\%, -D23. Under this notation, the 
quadratic form b T Db (where D is a matrix) can be rewritten as x^D (where D is a vector) with 
x b = (J) 2 , &§) 63, 26162, 26163, 2b 2 b^) T . In the following, we also assume that the matrix Ylb£B x b x "b 
is well-conditioned, which is guaranteed by an appropriate choice of the gradient directions. Then 
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the linear regression and non-linear regression estimators are denned as 

D LS := argnrin^(log5 6 - logS + x{D) 2 , (25) 

beB 

and 

D NL := argrrrin^(S' 6 - S exp(-x^)) 2 . (26) 

beB 

Note that, these estimators are used in the simulation studies under the Rician noise model (Section 
[3J to derive the observed tensor at each voxel, which is then used as input for the smoothing 
methods. 

In the following subsections, we analyze the behavior of the above estimators by varying the 
noise level, i.e., a, which corresponds to varying signal-to- noise ratio (SNR). Consistency of the 
estimators is established under the a — > setting, i.e., the "large SNR" scenario. In addition, 
we also compare the nonlinear regression estimator with the maximum likelihood estimator which 
utilizes the Rician noise model explicitly. Throughout, we shall use D° to denote the true diffusion 
tensor. 

5.2 Asymptotic analysis of D LS and D NL under large SNR 

In this subsection, we carry out asymptotic expansions of the estimates -Dls and -Djvl assuming 
that the noise level a goes to zero, while S^s are treated as fixed. Effectively, this means that, 
a <C minftggS'b, which is equivalent to assuming that So 3> a and that max^gg x\ D° is "not too 
large" . Clearly, the last condition holds if the largest eigenvalue of the tensor D° is not very large 
(note that, the gradient directions are normalized to have unit norm). 

Asymptotic expansion of D^s 

Proposition 3 As a — > 0, Dis = D° + aD\^s + ^ 2 ^ > 2,ls + Op(a 3 ) where the random vectors 
-Di.LS and -C^.ls are given by 

Di,ls = ~CE ^)" X (E w( u b £ b) x b), (27) 
beB beB b 

and 

D 2 ,ls = - {^e h ) 2 )x h ). (28) 

beB beB $b 

Remark 6 D\^s has a normal distribution with mean zero and variance 

Var(D hLS ) = Q^XbxD^Q^Sb^XbxJX^xtxJ)- 1 , (29) 

beB beB beB 

whereas £>2,ls is a weighted sum of differences of independent x?u random variables, and hence it 
also has mean 0. These show that the bias in Dls is of the order 0(er 3 ). 
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Asymptotic expansion of D^l 

Unlike Dls, there is no explicit expression for D^l- Instead, it satisfies the normal equation: 

y^^Sb - S exp(-xlD NL ))S exp(-xlD NL )x b = 0. (30) 

beB 

We have the following result. 

Proposition 4 As a — > 0, Dnl = D° + aDi nl nl + Op(a 3 ), where 

Di,nl = -(Y] S 2 b x h xl)~ l (y] S b (uje b )xb), (31) 

beB beB 

and 

D 2 ,nl = (^2 Slxbxl)' 1 
beB 



y~] S 2 b {xlDi, NL fx b + - y2( S b x b D i,NL + ule b ) 2 x b - - ^2 II £b II 2 a 
beB beB beB 

(32) 

Remark 7 D\^l has a normal distribution with mean zero and variance 

Var(D l , NL ) = (J2S 2 b x b xl)- 1 , (33) 



beB 

2 



and Z?2,A r L is a weighted sum of (dependent) x?u random variables, with mean 



E(D 2>NL ) = -hj2 s2 b x b x lr l 



E(i-^(E^'^) _1 



2 

beB IbeB \ b'eB 



x b x b 



(34) 



Observe that in {3$ , the coefficients of all the x b 's, are negative and thus the mean of D 2i nl is 
non-vanishing. Therefore, the bias in Dnl is of order 0(a 2 ). 

Remark 8 Propositions^ and\Q show that, under the Rician noise model, at relatively high SNR, 
both linear and nonlinear regression estimators of the diffusion tensor are nearly unbiased. This 
means that under such a scenario, when the regression estimators are used as input for the smoothing 
methods, the corresponding noise structure is nearly additive. 

Comparison of Dls and Dnl 

It is clear from the above expansions that both Dls an d Dnl are consistent estimators when a 
goes to zero, where Dls has a bias of order 0(<r 3 ) and that in D^l is of order 0(c 2 ). Up to the 
first order, both estimators have Gaussian fluctuations, and up to the second order, they become 
weighted sums of Gaussian and chi-squared random variables. The asymptotic analysis also points 
to an important advantage of Dnl over Dls'- the former has a smaller asymptotic variance than 
the latter. That is, 

C^slxhxl)' 1 H C^Xhxl^C^Sh^XhX^C^Xhxl)' 1 (35) 
beB beB beB beB 
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where ^ denotes the inequality between positive semi-definite matrices. This can be proved by 
using Schur complement and noticing that the matrix 

"q2 rp rp 

*~>b x b x b x b x b 

T ~q~ 2 T 

x b x b a b x b x b 

is positive semi-definite for each b € B. Indeed, the difference between the two asymptotic covariance 
matrices can be quite substantial if some S^s are significantly smaller than (1/|£>|) YlbeB This 
situation can arise when the true diffusion tensor D° has a strong directional component (i.e., 
being a highly anisotropic tensor). This is because, under such a situation, only a few gradient 
directions are likely to be aligned to the leading eigen-direction, which results in small Sb, whereas 
other gradient directions lead to large Sb- Consequently, Dls will have a high degree of variability 
and the quadratic forms b T Db associated with the gradient directions that are partially aligned to 
the leading eigen-direction will be poorly estimated. This fact has also been empirically verified 
through simulation studies (results not reported here). 



5.3 Comparison of MLE and D NL under large SNR 

The probability density function of the Rician noise distribution with signal parameter £ (repre- 
senting the amplitude of the signal) and noise parameter a is given by (cf. Polzehl and Tabelow, 
2008) 

x / x 2 + £ 2 \ f X C\ 

p<A x ) = ex P ( — ^pr~ ) 7 ° ( ^2 J 1(x > °)> ( 36 ) 

where Iq denotes the zero-th order modified Bessel function of the first kind. Under the Rician 
noise model (f23|) and (pH)) . the measurements SVs are independent with p.d.f. p^moM j, where D° 
is the true tensor and 

C(D,b):=S exp(-xlD) (37) 

so that S b = ((D°,b). 

We first compute the Fisher information matrix for the Rician noise model with respect to D. 
In the following, for simplicity, a is treated as known. 

Proposition 5 Under model A23\) and \2J$ , for a given a > 0, the Fisher information matrix with 
respect to D at D° based on the data {Sb ■ b G B}, is 



l(D°;a 



^J(CP°,6);a)(C( J D ,6)) 2 x fe xf 

6GB 

XbXb 

beB 



Here a) is the Fisher information for the parameter £ of the Rician distribution {3 
can be expressed as 



( M - 4t 



where, for w > 0, 

V{w) :-- 



' 3 (h(uw))- 
I (uw) 



1 poo 
^ 4 JQ 



Io(v) 



V. 



(38) 
and it 
(39) 

(40) 
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Remark 9 There is no closed form expression for V(w) . However, since both I\ and Iq are mono- 
tone increasing C°° functions with subexponential growth on [0, oo), this function can be evaluated 
for any given w by numerical integration using an appropriate quadrature method. 

Let Dml denote the maximum likelihood estimator of D. Then Dml satisfies the likelihood 
equation 

E V1 °g^(^)|c=C(B Mi ,6) =°> ( 41 ) 

where Vlogp^m ^),a{ x ) denotes the gradient of log p^jj^b), a ( x ) with respect to the parameter D. 
Then, similarly as in the case of Dnl, it can be shown that, as a — > 0, Dml — D AfL+op(°")j 
where 

E(D 1ML ) = 0, and Var(D 1)M L) = Q^slx^)- 1 . (42) 

beB 

This implies that the asymptotic variance of Dml is f 2 (X)&&B ^bX^xJ)' 1 + o(a 2 ). 

It can be checked that for every fixed the quantity a 2 o~) is less than 1 for all a > and 
converges to 1 as a — > 0. From this, it follows that 



a 2 l(D°; a) -> ^ S 2 b x b xl as a -> 0. (43) 



b£B 



Since a = o(l), by Proposition gj we have Djvl = D° + <jD 1jNL + O p (ct 2 ). Using (J33D and (g2]) it 
can be concluded that Dnl has the same asymptotic variance as the MLE when a — > 0. In other 
words, under the large SNR regime, the nonlinear regression estimator Dnl is asymptotically as 
efficient as the MLE Dml- Moreover, (|43p shows that when scaled by a 2 , the (common) asymptotic 
covariance matrix of Dnl and MLE is the limit of the inverse of the Fisher information matrix of 
D. 

5.4 Bias in D L s under small SNR 

Polzehl and Tabelow (2008) discuss in detail the issue of bias in estimating tensors from the raw 
DWI data. They show that the raw DWI signal is a biased estimator of Sb and the bias becomes 
larger at smaller SNR. Specifically, they derive an expression for the bias which is quoted below. 

Let B(xjD°,a) := E D o t(T (S b ) - S Q e~ x b D ° denote the bias in S b as an estimate of S b when 
(D°,o~) is the true parameter. Then 

— / c2 — 2x T D°\ 

r L ^(- 2 J )-S e-< D ° (44) 

where Li/ 2 {x) = e x / 2 [{l— x)Iq(— x/2)— xl\{— x/2)], and I u is the u-th order modified Bessel function 
of the first kind (cf. Abramowitz and Stegun, 1965). 

In this subsection, we investigate the effect of the noise on Dls- For simplicity, we consider 
an idealized framework where the "informative" gradient directions can be separated from the 
"noninformative" ones. 

Suppose that there is a direction b € B such that xjD° is large in the sense that, Sb = 
Soexp(—xjD°) <^ a, even though a So. In that case, the analysis presented in Section ED 
is not valid since the Taylor expansions adopted there is not appropriate due to the explosive 
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behavior of the higher order derivatives of the associated functionals. Instead, we consider the 
following approximation: 



log S b - log S 



, n -t t d° ° 
log || e u b + —e b 



1 



log 



log 



C2 II Sb 
^0 



S 2 o 

Sq 
a 



So 

2 +2^-e~ x i 



o 



e b II 2 +Sl/v 2 \ 



u b e b + e 



+ - log 1 + 2 



S b 



+ ^ log(|| e b || 2 +S 2 b /a 2 ) + 



S b 



T 

H e b 



£b 
Eb 



\ 2 +st/^J 

ii T 

e b 



° || e b || 2 +Sb/a 2 II £«. 



+ P 



II £b II 

^ 2 (|| £fe P+s'/a 2 ) 2 , 



(45) 



Under the Rician noise mode (|24p . ~ JV(0, J2); and so || || 2 has the Exponential distribution 
with mean 2. Moreover, the third term on the RHS of (|45|) has mean 0. Thus this expansion clearly 
indicates that under the regime where S b <C cr, the linear regression estimator Dls will be biased . 

To make the argument more concrete, we consider the scenario where the gradient set B can 
be partitioned into Bo U Bq such that, for b G ,8o> = an d f° r & G ^ci' ^6 = °( CJ )- 

Then the linear regression estimator .D^s can be expressed as follows. Define S* := min& e B/S'&, 
S* := maxjggjj S fe and 5** := min feeBo Sb- Then, 



D L s = -C^x b xl) 1 ( s ^(\ogS b -\ogSo)x b ) 



b&B 



b&B 



beB 



XbXb ) 



bee fee B b 



x b x b) 



b&B 



lo s (v) £ X6 - 5 £ log(l1 £b f + ^ /a 



5';, 



T 

Eb 



beB * ° || Eb || 2 +^/a 2 II e b 



•?'6 



+ Op max{ =* , — 



(7 



5 



max ■ 



Eb 



* J beB S (|| e b ||2 +(5«/ ( r) 2 ) 2 



} 



The above expression can be decomposed as the summation of the bias and variance terms up to 



23 



the first order, which is more easily interpretable: 



D LS -D° = (J> 6 a£)- 



beB 



ij2 x b*b)D° + E( lo g(v } - 5 108(11 £h l|2 



beBR 



best 



beB 



Sb 



£b 



T 

u b e b 



beB 



beBZ 



° || e b || 2 +S z b /a 2 II ?b 



Xb 



+ P [ max{ ( — 



a 



~Fi \ 2 



max 



£b 



S * 



° J beBf, (|| £b || 2 +(5«/a) 2 ) 2 



} • 



(46) 



The second term on the RHS has mean zero and has the major contribution in the variability of 
Dls, while the first term primarily contributes to its bias. 
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Appendix 

Perturbation analysis of tensor means 
Proof of Proposition [TJ 

First consider case (a): the eigenvalues of S are distinct. Let ^j(w) denote the j-th largest eigenvalue 
of S(uj) and Qj(io) denote the corresponding eigen-projection. Then under (fl~5|) . for t sufficiently 
small, with probability 1, /Uj(o;) is of multiplicity 1, and so Qj{u) is a rank 1 matrix. We can then 
use the following matrix perturbation analysis results: 

Hj(u) = Xj + tT(PjB(u)) - tiiPjB^HjBiu)) + 0(C 2 \\ B ||^J, (47) 

where tr denotes the trace of a matrix, and 

Qj{u) = Pj - {P j B{u)H j + HjB^Pj) 

+P j B(uj)H j B(uj)H j + HjB(bj)PjB(uj) Hj + HjB (u)HjB (uj)Pj 
-PjB^PjB^Hf - PjB^HfB^Pj - H]B{u))PjB(u)Pj + 0{C 3 || B ||^)(48) 

where || B ||oo:= sup w6n || B(u) ||. 

In order to compare the Euclidean mean with the mean defined in terms of the log-Euclidean 
metric, we consider an asymptotic expansion of log5(w) around log S, where the asymptotics is as 
t -> 0. 
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By definition, and using (|15p . for t sufficiently small, 
log S(uj) — log S 

N 

= 5^(logMj(w)QjH - logAjPj) 

AT ^ AT 

= J^log^w) - logAjJPj +X)bgAj(Qj(w) - Pj) + E( 1( wM - logA i )(g J -(w) - P,) 
j=i j=i j=i 

(49) 



Now, from (|4T|) and (|TB]). we have 

log ^M- log A,- = log (l + ^ ~ Aj ) 



= ^[tr(P,P(u;)) - tr(P i P(a;)P; i P(a;))] - ^[tr(P,-B(a;))] 2 + 0(t 3 ), 

^ 3 

where we have used the series expansion log(H-sc) = E^LiC- 1 )"" 1 ^/"- Substituting in ggj and 
using ([35]). we get, 

log 5 -log S 

N N N 

= £ j:to(PjB("))Pj ~ E yto( p j B (<") H i B (u)) p i ~ 2 E ^[tr(P,sM)] 2 P,- 

i=i J j=i J i=i •? 

AT 

- ^ log Xj [PjB(u)Hj + HjBMPj] 

3=1 
N 

+ J]logA i [Pj B{uj)HjB(u) Hj + HjB(cj)PjB(ui)Hj + BjB(co)BjB (cj)Pj 

3=1 

-PjB(u})PjB(u)H] - PjB{u)HlB(uj)Pj - BjB^PjB^Pj] 
- 1 

- E T-tr(P ? P(a;))(P j P(w)P j + HjB{u)Pj) + 0{t 3 ). 

3=1 Xj 

From this, we get (|17p by taking expectation, and recalling that E(P) = O. 
Now consider case (b): S = X\I. Then, from the expansion 

logS(u;) = (log Ai)J + log + = logS + ^B(u)-^(B(oj)) 2 + 0(C 3 \\B\\l), 



we obtain (jlol) by taking expectation. 



Proof of Proposition [2] 

Define, 



L(u) = \og{Sl)fs(u)S- A )f), .efi. 
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Note that, implies that E(L) = 0. First, we show that 



L 1(00:= sup || L ||= 0(t). (50) 



This implies that 

a/2 T, ~l/2 



5 = E(5) = nS2ff e SAff 

' 2 



SAff + S%E{L)S% + \s%E{L 2 )S% + 0(t 3 ) 



from which we get 

S Aff = S- ^S 1/2 E(L 2 )S 1/2 + 0(t 3 ), (51) 

where, we have used the fact that E(L) = 0. Now, we express E(L 2 ) in terms of an expectation 
involving B = S — S. In order to do this, observe that 



B{uj) = S(u)-S = S)(] f e L ^s)l] f -S Af f + S Af f-S 

= S%L{u)S% + l -S%{{L{u)) 2 - E{L 2 ))S% + 0(i 3 ) 
= S l ^L{u)S 1 ' 2 + ^/ 2 ((LH) 2 - E{L 2 ))S 1 ^ 2 + 0(t 3 ), 



where, in the second and third steps we have used (|50j) and (|5Tj) . Therefore, again using (1501) . 

E(L 2 ) = S^^EiBS^B)^ 2 + 0{t 3 ), (52) 
which, together with (151 p . leads to the representation 

S Aff = S-^E(BS~ 1 B) + 0(t 3 ). (53) 

Now we are in a position to complete the proof. For case (a), using similar perturbation analysis 
as in the expansion of logs' (a;) around logS, and then using (|53p . which in particular implies that 
|j § Af f - S ||= 0{t 2 ), we obtain (|2D|> . 

For case (b), since S = X\I, from (|53p . and using series expansion of log(/ + K), where K is 
symmetric, (|2ip immediately follows. 



Proof of (|5D]> 

From the fact that || B ||oo= O(i), it easily follows that E{d 2 A ^{S , S)) = 0(t 2 ). Then, by definition 
of SAff it follows that E(d 2 Aff {S Aff ,S)) = 0{t 2 ) which implies E(d Aff (S Aff , 5)) = 0(t). Now, 
writing 

log(S^f S(u,)S^ 2 ) = ^{B^fSB^f + S-)f B^)S A )f) 

and using the Baker-Campbell-Hausdorff formula, together with the fact that || B ||oo= 0(t), we 
conclude that 

d A ff(S A ff,S) = 0(t) so that || S A ff — S \\= Oit). 
From this it is easy to deduce (f50|) . 
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Rician noise model 

Facts about Bessel functions 

The following result about Bessel functions (cf. Abramowitz and Stegun, 1965) is crucial in proving 
results involving the Rician noise model. 

Lemma 1 For every v, and t 7^ 

2v 



/.+i(i) = U(t) 



t 



-J„(t) and l' u {t)=I v _ l {t)-^I v {t) = I u+l {t) + ^I u {t), 



(54) 



where I' v (t) denotes the derivative of I v (t) with respect to t 

We list down some facts about the function 

h(t) 



F(t) :- 



Io(t)' 



t € 



(55) 



that we have used in deriving the asymptotic expansion for the MLE of D in the limit a 
1. Using (|53j) . we have a differential equation for F, 

1 



0. 



(56) 



F'(t) = 1 - -F(t) - (F(t)) 2 for t + 0. 

2. Fit) -)■ 1 and t(l - F(t)) ->• 1/2 as t -> oo. 

3. Above facts, together with (JS5D imply that tF'(t) -> and £F"(i) -)• as t -> oo. 

Proof of Proposition [5] : Let V logjam 6) j(T (x) denote the gradient of logp^m m CT (a;) with respect 
to the parameter D. Using the chain rule of differentiation, we have 



d 

Vlogp c(D)6))(7 (a;) = — \ogp Cy(T {x) 



d 

\(=C(D,b) ^D((D,b) = — logp Cj(J (x) \t=c{D,b) ■ (-((D,b))x b . 



(57) 

It is enough therefore to find the Fisher information for £ under the Rician noise model with 
parameter (Cj°~) (with a known). Now, it follows from (154p that ig(i) = Ii(t). Thus, 



9 l ^ 

— log P<>a {x) 



C 



^ + cj2 In 



(58) 



Since logp^ j(J (A)) = 0, from (I58p it follows that the Fisher information for £ is given by, 



:J(C;ct) :=Var(^logp Ci(T (A)) 



E 



2£C 



in 



_ C 

e 



(7" 



3(/l(§ 



' -* 2 c 2 

-e z^dx j 

O" 4 



e 2^ 



(7 



(9 


C 2 1 







(59) 



where, U(u>) is given by (|40j) . 
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Outline of asymptotic expansions of D^g an( l Dnl 

Define, for an arbitrary tensor D and arbitrary gradient direction b, 

f b (w,D) :=|| S e-^ D u b + w\\ . 



(60) 



Then, f b (0,D°) = S b and f b (creb, D°) = S b . We denote the partial derivatives with respect to the 
first and second arguments by Vijf b etc, where 1 < i,j < 2. Then, 



1 



fb(w,D°) 



(S b u b + w) => Vif b (0,D°) = ^-S b u b = u b . 



(61) 



Now, let v b € M 2 be such that v^u b = and || v b ||= 1 so that u b uj + w^jf = ^2- Thus, using (|6ip . 



Vufb(w,D c 



(S b u b + w)(S b u b + w 



-D°) z (f h (w,D )) 3 
Vnfb^D ) = ^(h- u b u T b ) = ^v b vf. 

D b Ob 



We also have, 
and 



V 2 f b (0,D°) = -S exp(-xjD°)x b = -S b x b 

V 2 2/f>(0,£>°) = S exp(-xlD°)x b xl = S b x b x b '. 
Proof of Proposition [3] : 

Dls = -C^XbxlrH^log S h - log S )x b ) 

beB beB 



(62) 

(63) 
(64) 



fceB 



beB 



6eB 



/ 6 (0,£>°) 



e^Vii/ 6 (0,D )e 6 (e£'ViA(0, D u )) 



0^2 



fb(P,D°) 



beB beB 
This, together with (|6ip and (j62j) . proves Proposition [3j 



(A(o,£ )) 2 



X 6 + Op(c7 3 ). 



Proof of Proposition [4] : From this (|30p . and the definition of Dnl, using standard arguments, 
it can be shown that || D^i — D° \\= Op(a) as a — > 0. Then, expanding the LHS of (|30p in Taylor 
series, we have 



beB 



<7<#Vi/ & (0,L>°) + V#Vii/6(0,£>°)e 6 - (V 2 / 6 (0, D»)) T (D NL - D") 



(j- 



0\\T( 



--(D NL - D°) T V 22 f b (0,D°)(D NL - D°) 



V2fb(0,D ) + V 22 fb(0,D )(D NL -D )) = P (a 
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Now, changing sides and collecting terms corresponding to a given power of <r, we can express D^l 
as D NL = D° + aD 1>NL + a 2 D 2 ,NL + P (a 3 ), where 

D 1 , NL = (^V 2 / b (0, J D )(V 2 / b (0, J D )) T )- 1 (^ e ^V 1 / fe (0, J D )V 2 A(0, J D )) 
beB beB 



which equals (|3Tj) by virtue of (|63|) . Next, we have 
^2,tvl = (^V 2 / 6 (0, J D°)(V 2 A(0, J D )f)- 1 - 



beB 



\ £ (^V u / 6 (0, £>°)e& - Z^ L V 22 A(0, D°)D ltNL ) V 2 / 6 (0, D°) 



+ ^(^Vi/ 6 (0, J D ))V 2 2/ 6 (0, J D ) J Di,ArL - ^(^jv L V 2 A(0, D°))V 22 / b (0, .D )^, 
6GB 6es 

Now, using (|6ip -([Mj) and (|3ip . we can simplify the expression for D 2 jvl as 



2,NL 



C^2s 2 b x b xl) 1 



6eB 



3 2 1 



NL)Xb 



beB 



beB 



beB 



which can be rearranged to get ([32 
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Figures for Rician noise model 



linear regression input, sig=0.1— whole set 



linear regression input, sig=0.1— bands crossing 
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linear regression input, sig=0.1— background interior 



linear regression input, sig=0.1 — bands interior 
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linear regression input, sig=0.1— background boundary 
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Figure 3: Rician noise with a = 0.1. Comparison of median errors over different regions for 
"observed" tensors (obtained by linear regression, represented by black circle) and Euclidean 
(red triangle), log-Euclidean (green +) and Affine smoothers (blue x). 
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nonlinear regression input, sig=0.1— whole set 



nonlinear regression input, sig=0.1— bands crossing 
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nonlinear regression input, sig=0.1 — background boundary 
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Figure 4: Rician noise with a = 0.1. Comparison of median errors over different regions for 
"observed" tensors (obtained by nonlinear regression, represented by black circle) and Euclidean 
(red triangle), log-Euclidean (green +) and Affine smoothers (blue x). 
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linear regression input, sig=0.5— whole set 
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Figure 5: Rician noise with a = 0.5. Comparison of median errors over different regions for 
"observed" tensors (obtained by linear regression, represented by black circle) and Euclidean 
(red triangle), log-Euclidean (green +) and Affine smoothers (blue x). 
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Figure 6: Rician noise with a = 0.5. Comparison of median errors over different regions for 
"observed" tensors (obtained by nonlinear regression, represented by black circle) and Euclidean 
(red triangle), log-Euclidean (green +) and Affine smoothers (blue x). 



33 



linear regression input, sig=1— whole set 



linear regression input, sig=1— bands crossing 



A f^v r\ /*s kx 

vfk U U U 52 U 

A A * 
A 

A A 


median error 
1.0 2.0 3.0 4.I 


+ + 

x X 

X X 

* 8 £ o R R 


1 2 3 4 5 6 
0.005 (0.005,0.01) 0.01 (0.01,0.01) (0.01,0.025) 0.025 

bandwidth 

linear regression input, sig=1— background interior 




i i i i i i 

1 2 3 4 5 6 
0.005 (0.005,0.01) 0.01 (0.01,0.01) (0.01,0.025) 0.025 

bandwidth 

linear regression input, sig=1— bands interior 


& 

i * * 

A A 


median error 

I 2 3 4 5 


+ + 
+ + X 

* x x x 

* g O O O 

A A a A 


I I I I I I 

0.0 1 05 (0.005,0.01) oil (0.01*0.01) (0.01,0.025) 0.§25 
bandwidth 

linear regression input, sig=1— background boundary 




ill 

O.do5 (0.005,0.01) 3 01 (0.01 4 0.01) (0.01 $.025) 0.§25 
bandwidth 

linear regression input, sig=1— bands boundary 


+ ^ 
X X 

$ 

* * 

A a * 

A A A 


median error 
I 2 3 4 5 


+ 

, + + x + 
XXX x 

* g O O O 

A A a A 


I I I I I I 

0.rj05 (0.005,0.01 ) OBI (0.01,0.01 ) (0.01 ,0.025) 0.§25 




0.do5 (0.005,0.01) 3 01 (0.01 4 0.01) (0.01 $.025) 0.§25 



Figure 7: Rician noise with a = 1. Comparison of median errors over different regions for 
"observed" tensors (obtained by linear regression, represented by black circle) and Euclidean 
(red triangle), log-Euclidean (green +) and Affine smoothers (blue x). 
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Figure 8: Rician noise with a = 1. Comparison of median errors over different regions for 
"observed" tensors (obtained by nonlinear regression, represented by black circle) and Euclidean 
(red triangle), log-Euclidean (green +) and Affine smoothers (blue x). 
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Figures for spectral noise 
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Figure 9: Spectral noise with v = 50 and rj = 0.1. Comparison of median errors over different 
regions for observed tensors (black circle) and Euclidean (red triangle), log-Euclidean (green +) 
and Affine smoothers (blue x). 
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nu=50, eta=0.2— whole set 



nu=50, eta=0.2— bands crossing 
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Figure 10: Spectral noise with v = 50 and r/ = 0.2. Comparison of median errors over different 
regions for observed tensors (black circle) and Euclidean (red triangle), log-Euclidean (green +) 
and Affine smoothers (blue x). 
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nu=50, eta=0.3— whole set 



nu=50, eta=0.3— bands crossing 
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Figure 11: Spectral noise with v = 50 and r/ = 0.3. Comparison of median errors over different 
regions for observed tensors (black circle) and Euclidean (red triangle), log-Euclidean (green +) 
and Affine smoothers (blue x). 
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nu=20, eta=0.1— whole set 



nu=20, eta=0.1 —bands crossing 
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Figure 12: Spectral noise with v = 20 and 77 = 0.1. Comparison of median errors over different 
regions for observed tensors (black circle) and Euclidean (red triangle), log-Euclidean (green +) 
and Affine smoothers (blue x). 
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Figure 13: Spectral noise with v = 20 and r/ = 0.2. Comparison of median errors over different 
regions for observed tensors (black circle) and Euclidean (red triangle), log-Euclidean (green +) 
and Affine smoothers (blue x). 
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nu=20, eta=0.3— whole set 



nu=20, eta=0.3— bands crossing 
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Figure 14: Spectral noise with v = 20 and 77 = 0.3. Comparison of median errors over different 
regions for observed tensors (black circle) and Euclidean (red triangle), log-Euclidean (green +) 
and Affine smoothers (blue x). 
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